
5 research outputs found

MSF-Model: Modeling Metastable Failures in Replicated Storage Systems

Author: Habibi Farzad
Lorido-Botran Tania
Nawab Faisal
Showail Ahmad
Sturman Daniel C.
Publication date: 28/09/2023

Metastable failure is a recent abstraction of a pattern of failures that occurs frequently in real-world distributed storage systems. In this paper, we propose a formal analysis and modeling of metastable failures in replicated storage systems. We focus on a foundational problem in distributed systems -- the problem of consensus -- to have an impact on a large class of systems. Our main contribution is the development of a queuing-based analytical model, MSF-Model, that can be used to characterize and predict metastable failures. MSF-Model integrates novel modeling concepts that allow modeling metastable failures, which was intractable prior to our work. We also perform real experiments to reproduce metastable failures and validate our model. Comparing the real experiments with the predictions from the queuing-based model shows that MSF-Model predicts metastable failures with high accuracy.

arXiv.org e-Print Archive

PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development

Author: Barnett R. Matthew
Jermaine Chris
Lorido-Botran Tania
Luo Shangyu
Monroy Carlos
Sikdar Sourav
Teymourian Kia
Yuan Binhang
Zou Jia
Publication date: 01/01/2017

This paper describes PlinyCompute, a system for development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database style optimization to figure out how to stage distributed computations. However, in the small, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and an associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM), and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual method/function dispatch to the JVM. This hybrid approach---declarative in the large, trusting the programmer's ability to utilize the PC object model efficiently in the small---results in a system that is ideal for the development of reusable, data-intensive tools and libraries. Through extensive benchmarking, we show that implementing complex object manipulation and non-trivial, library-style computations on top of PlinyCompute can result in a speedup of 2x to more than 50x compared to equivalent implementations on Spark. Comment: 48 pages, including references and Appendi

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency

Author: Doyle Joseph
Ghafouri Saeid
Jamshidi Pooyan
Lorido-Botran Tania
Razavi Kamran
Salmani Mehran
Sanaee Alireza
Wang Lin
Publication date: 24/08/2023

Efficiently optimizing multi-model inference pipelines for fast, accurate, and cost-effective inference is a crucial challenge in ML production systems, given their tight end-to-end latency requirements. To simplify the exploration of the vast and intricate trade-off space of accuracy and cost in inference pipelines, providers frequently opt to consider only one of them. However, the challenge lies in reconciling accuracy and cost trade-offs. To address this challenge and propose a solution to efficiently manage model variants in inference pipelines, we present IPA, an online deep-learning Inference Pipeline Adaptation system that efficiently leverages model variants for each deep learning task. Model variants are different versions of pre-trained models for the same deep learning task with variations in resource requirements, latency, and accuracy. IPA dynamically configures batch size, replication, and model variants to optimize accuracy, minimize costs, and meet user-defined latency SLAs using Integer Programming. It supports multi-objective settings for achieving different trade-offs between accuracy and cost objectives while remaining adaptable to varying workloads and dynamic traffic patterns. Extensive experiments on a Kubernetes implementation with five real-world inference pipelines demonstrate that IPA improves normalized accuracy by up to 35% with a minimal cost increase of less than 5%.

arXiv.org e-Print Archive

Keep It Simple: Fault Tolerance Evaluation of Federated Learning with Unreliable Clients

Author: Anderson Chris
Botran Tania Lorido
Huang Victoria
Mayo Michael
Ooi Melanie
Rodrigues Mark
Sohail Shaleeza
Publication date: 16/05/2023

Federated learning (FL), as an emerging artificial intelligence (AI) approach, enables decentralized model training across multiple devices without exposing their local training data. FL has been increasingly gaining popularity in both academia and industry. While research works have been proposed to improve the fault tolerance of FL, the real impact of unreliable devices (e.g., dropping out, misconfiguration, poor data quality) in real-world applications is not fully investigated. We carefully chose two representative, real-world classification problems with a limited number of clients to better analyze FL fault tolerance. Contrary to intuition, simple FL algorithms can perform surprisingly well in the presence of unreliable clients.

arXiv.org e-Print Archive

A Review of Auto-scaling Techniques for Elastic Applications in Cloud Environments

Author: B Urgaonkar
CZ Xu
DA Menasce
E Caron
E Casalicchio
HC Lim
J Guitart
Jose A. Lozano
Jose Miguel-Alonso
L Wang
M Mao
P Koperek
Q Zhu
R Prodan
S Islam
Tania Lorido-Botran
V Méndez Muñoz
W Iqbal
Publication venue: Springer Science and Business Media LLC

Crossref